Statistical Machine Translation of French and German into English Using IBM Model 2 Greedy Decoding

Author

Michael Turitzin

Abstract

The job of a decoder in statistical machine translation is to find the most probable translation of a given sentence, as defined by a set of previously learned parameters. Because the search space of potential translations is essentially infinite, designing a decoder always involves a trade-off between accuracy and speed. Germann et al. [4] recently presented a fast, greedy decoder that starts with an initial guess and then refines that guess through small “mutations” that produce more probable translations. The greedy decoder in [4] was designed to work with the IBM Model 4 translation model, which, although a sophisticated model of the translation process, is also quite complex and therefore difficult to implement and fairly slow in training and decoding. We present modifications to the greedy decoder of [4] that allow it to work with the simpler and more efficient IBM Model 2. We tested our modified decoder by having it translate equivalent French and German sentences into English, and we present the results and translation accuracies we obtained. Because we are interested in the relative effectiveness of our decoder across different source languages, we discuss the discrepancies between our French-to-English and German-to-English results, and we speculate on the factors inherent to these languages that may have contributed to these discrepancies.
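
The greedy search described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the translation table, the bigram language model, the vocabulary, the floor values, and the diagonal-favoring alignment distribution below are all hypothetical toy values chosen for illustration, and only two mutation types (word substitution and adjacent swap) are included, whereas the decoder in [4] uses a richer set. The sketch scores a hypothesis e for a source sentence f as log P(e) + log P(f | e), with P(f | e) computed as the IBM Model 2 product over source positions of the sum over target positions of t(f_j | e_i) a(i | j, l, m).

```python
import math

# Toy parameters, invented for illustration (not taken from the paper).
T = {("la", "the"): 0.9, ("maison", "house"): 0.9, ("bleue", "blue"): 0.9}
BIGRAM = {("<s>", "the"): 0.9, ("the", "blue"): 0.4, ("the", "house"): 0.4,
          ("blue", "house"): 0.8, ("house", "</s>"): 0.8}
VOCAB = ["the", "house", "blue"]
T_FLOOR, LM_FLOOR = 1e-4, 1e-3  # assumed probabilities for unseen events


def align_prob(i, j, l, m):
    """a(i | j, l, m): an assumed alignment distribution favoring the diagonal."""
    def weight(k):
        return math.exp(-abs(k / l - j / m))
    return weight(i) / sum(weight(k) for k in range(1, l + 1))


def log_tm(f, e):
    """log P(f | e) under IBM Model 2 (NULL word omitted for brevity)."""
    l, m = len(e), len(f)
    total = 0.0
    for j, fw in enumerate(f, start=1):
        total += math.log(sum(T.get((fw, ew), T_FLOOR) * align_prob(i, j, l, m)
                              for i, ew in enumerate(e, start=1)))
    return total


def log_lm(e):
    """log P(e) under a toy bigram language model."""
    words = ["<s>"] + e + ["</s>"]
    return sum(math.log(BIGRAM.get(pair, LM_FLOOR))
               for pair in zip(words, words[1:]))


def score(f, e):
    """Combined log score: language model plus translation model."""
    return log_lm(e) + log_tm(f, e)


def neighbors(e):
    """Yield hypotheses one mutation away: substitute a word or swap a pair."""
    for i in range(len(e)):
        for w in VOCAB:
            if w != e[i]:
                yield e[:i] + [w] + e[i + 1:]
    for i in range(len(e) - 1):
        yield e[:i] + [e[i + 1], e[i]] + e[i + 2:]


def greedy_decode(f):
    """Hill-climb from a word-for-word gloss until no mutation improves the score."""
    # Initial guess: gloss each source word with its most likely translation.
    e = [max(VOCAB, key=lambda w: T.get((fw, w), T_FLOOR)) for fw in f]
    best = score(f, e)
    while True:
        cand = max(neighbors(e), key=lambda h: score(f, h), default=e)
        cand_score = score(f, cand)
        if cand_score <= best:
            return e, best
        e, best = cand, cand_score


if __name__ == "__main__":
    print(greedy_decode(["la", "maison", "bleue"]))
```

With these toy parameters the translation model alone prefers the monotone gloss, so any reordering has to be driven by the language model term. Because the search only ever accepts improving mutations, it can stop at a local optimum, which is the accuracy-for-speed trade-off the abstract refers to.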

Similar resources

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian also has multi-part words. In Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common misuse of the space leads to some s...

A New Decoding Algorithm for Statistical Machine Translation: Design and Implementation

We describe a new algorithm for the Decoding problem in Statistical Machine Translation. Our algorithm is based on the Alternating Optimization framework and employs dynamic programming. The time complexity of the algorithm is O m , where m is the length of the sentence to be translated, which is the best among all known algorithms for the problem. As the search space explored by the algorithm ...

Squibs and Discussions: Decoding Complexity in Word-Replacement Translation Models

Statistical machine translation is a relatively new approach to the long-standing problem of translating human languages by computer. Current statistical techniques uncover translation rules from bilingual training texts and use those rules to translate new texts. The general architecture is the source-channel model: an English string is statistically generated (source), then statistically tran...

Decoding Complexity in Word-Replacement Translation Models

Statistical machine translation is a relatively new approach to the long-standing problem of translating human languages by computer. Current statistical techniques uncover translation rules from bilingual training texts and use those rules to translate new texts. The general architecture is the source-channel model: an English string is statistically generated (source), then statistically transform...

Stanford University's Submissions to the WMT 2014 Translation Task

We describe Stanford’s participation in the French-English and English-German tracks of the 2014 Workshop on Statistical Machine Translation (WMT). Our systems used large feature sets, word classes, and an optional unconstrained language model. Among constrained systems, ours performed the best according to uncased BLEU: 36.0% for French-English and 20.9% for English-German.

Publication date: 2005
